Together AI: Fine-Tuning LLMs
This notebook provides a step-by-step guide to fine-tuning Large Language Models (LLMs) using the Together AI platform. We will cover the entire workflow, from preparing your dataset to making inference calls with your new custom model.
💡 Key Concepts in Fine-Tuning
Before we write code, let's grasp the core ideas:
- Fine-Tuning: This is the process of taking a general-purpose, pre-trained LLM and further training it on a smaller, specific dataset. This adapts the model to your particular domain or task, such as a customer support chatbot or a code generator for a specific programming language.
- Dataset Formatting: The quality and format of your data are critical. For instruction-based fine-tuning, you need to structure your data with clear prompts and desired responses. Together AI expects data in JSONL format, where each line is a JSON object containing a "text" field (see the example after this list).
- Base Model: This is the pre-trained model you start with. Your choice of base model is important. For example, a model pre-trained on chat is a better starting point for a chatbot than a raw text-completion model. Together AI offers many state-of-the-art open-source models.
- Hyperparameters: These are the settings for your training job, such as learning_rate, batch_size, and the number of epochs (how many times the model sees the entire dataset). Tuning these can significantly impact your model's performance.
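To make the expected format concrete, here is a small illustrative snippet (the instruction/response pair is made up) that applies the Llama-style template used later in this notebook and prints the resulting JSONL line:
import json
# Hypothetical instruction/response pair, for illustration only
example = {
    "instruction": "What is the capital of France?",
    "response": "The capital of France is Paris.",
}
# Llama-style instruction template: one JSON object with a single "text" field per line
line = {"text": f"<s>[INST] {example['instruction']} [/INST] {example['response']} </s>"}
print(json.dumps(line))
# {"text": "<s>[INST] What is the capital of France? [/INST] The capital of France is Paris. </s>"}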
⚙️ 1. Setup and Installation
First, we need to install the necessary Python libraries.
# Uncomment to install the required packages
# %pip install -U together datasets transformers python-dotenv -q
Loading API Keys
We'll use the dotenv library to securely load our Together AI API key from a .env file. Create a file named .env in the same directory as this notebook and add your key:
TOGETHER_API_KEY="your-together-api-key-here"
import os
import together
from dotenv import load_dotenv
load_dotenv()
# Load API keys from environment variables
os.environ["TOGETHER_API_KEY"] = os.environ.get("TOGETHER_API_KEY", "")
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = os.environ.get("HUGGINGFACE_ACCESS_TOKEN", "")
TOGETHER_API_KEY = os.environ.get("TOGETHER_API_KEY")
HUGGINGFACE_ACCESS_TOKEN = os.environ.get("HUGGINGFACE_ACCESS_TOKEN")
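Before moving on, it can help to confirm the key was actually loaded. The check below is a minimal, optional sketch that simply fails fast if the variable is empty:
# Fail fast if the Together AI key was not found in the environment or .env file
if not TOGETHER_API_KEY:
    raise ValueError("TOGETHER_API_KEY is not set. Add it to your .env file or environment.")
print("Together AI API key loaded.")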
📊 2. Data Preparation
We will use a small sample from the databricks/databricks-dolly-15k dataset. We'll format it into the required JSONL structure using a standard instruction template (<s>[INST]...[/INST]...</s>) that works well with models like Llama.
import json
from datasets import load_dataset
# Load a sample of 500 examples from the dataset
dataset = load_dataset("databricks/databricks-dolly-15k", split="train", token=HUGGINGFACE_ACCESS_TOKEN).select(range(500))
def format_for_finetuning(example):
# Use a standard instruction format
return {"text": f"<s>[INST] {example['instruction']} [/INST] {example['response']} </s>"}
formatted_dataset = dataset.map(format_for_finetuning)
# Save the prepared data to a JSONL file (only the 'text' field per line)
file_name = "dolly_prepared.jsonl"
with open(file_name, 'w') as f:
for item in formatted_dataset:
f.write(json.dumps({"text": item["text"]}) + "\n")
print(f"Dataset prepared and saved to {file_name}")
print("--- Sample Entry ---")
with open(file_name, 'r') as f:
print(json.loads(f.readline())['text'])
Dataset prepared and saved to dolly_prepared.jsonl --- Sample Entry --- <s>[INST] When did Virgin Australia start operating? [/INST] Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. </s>
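Before uploading, you may want to validate the file locally. The sketch below uses only the standard library and checks that every line parses as JSON and contains a non-empty "text" field:
# Minimal local validation of the prepared JSONL file (no API calls)
with open(file_name, "r") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)  # raises if the line is not valid JSON
        assert isinstance(record.get("text"), str) and record["text"].strip(), f"Bad record on line {i}"
print("All lines are valid JSON objects with a non-empty 'text' field.")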
🚀 3. Upload File & Start Fine-Tuning
Now we upload our prepared dataset and launch the fine-tuning job. We will fine-tune the togethercomputer/llama-2-7b-chat model.
# 1. Upload the training file
try:
upload_response = together.Files.upload(file=file_name)
training_file_id = upload_response['id']
print(f"File uploaded successfully. File ID: {training_file_id}")
# 2. Create the fine-tuning job
fine_tune_response = together.Finetune.create(
training_file=training_file_id,
model='togethercomputer/llama-2-7b-chat', # Base model to fine-tune
n_epochs=3, # Number of training epochs
n_checkpoints=1, # Number of checkpoints to save
batch_size=8, # Batch size
learning_rate=1e-5, # Learning rate
suffix='dolly-llama2-7b-tutorial', # A custom name for your fine-tuned model
)
print("\nFine-tuning job created:")
print(fine_tune_response)
except Exception as e:
print(f"Error uploading file or creating fine-tune job: {e}")
/var/folders/pv/g_b0j0n53rz5fm8yrlw3jg040000gn/T/ipykernel_55499/1191766534.py:3: DeprecationWarning: Call to deprecated function upload. upload_response = together.Files.upload(file=file_name) Uploading file dolly_prepared.jsonl: 100%|██████████| 266k/266k [00:01<00:00, 212kB/s]
File uploaded successfully. File ID: file-9984a196-2c4b-4f82-b44e-da96665f34b1
/var/folders/pv/g_b0j0n53rz5fm8yrlw3jg040000gn/T/ipykernel_55499/1191766534.py:8: DeprecationWarning: Call to deprecated function create. fine_tune_response = together.Finetune.create(
Fine-tuning job created: {'id': 'ft-91b93aa1-b4dd', 'training_file': 'file-9984a196-2c4b-4f82-b44e-da96665f34b1', 'model': 'togethercomputer/llama-2-7b-chat', 'n_epochs': 3, 'n_checkpoints': 1, 'n_evals': 0, 'batch_size': 8, 'learning_rate': 1e-05, 'lr_scheduler': {'lr_scheduler_type': 'cosine', 'lr_scheduler_args': {'min_lr_ratio': 0.0, 'num_cycles': 0.5}}, 'warmup_ratio': 0.0, 'max_grad_norm': 1.0, 'weight_decay': 0.0, 'eval_steps': 0, 'training_type': {'type': 'Lora'}, 'created_at': '2025-08-05T17:54:21.187Z', 'updated_at': '2025-08-05T17:54:21.187Z', 'status': <FinetuneJobStatus.STATUS_PENDING: 'pending'>, 'events': [], 'token_count': 0, 'total_price': 0, 'wandb_base_url': '', 'wandb_project_name': '', 'wandb_name': '', 'train_on_inputs': 'auto', 'suffix': 'dolly-llama2-7b-tutorial', 'training_method': {'method': 'sft', 'train_on_inputs': 'auto'}, 'random_seed': 'null', 'max_steps': -1, 'save_steps': 0, 'warmup_steps': 0, 'validation_split_ratio': 0, 'per_device_batch_size': 0, 'per_device_eval_batch_size': 0, 'gradient_accumulation_steps': 1, 'continued_checkpoint': '', 'merge_parent_adapter': False, 'parent_ft_id': '', 'try_byoa_upload': True, 'user_id': '68920bddaeed77e341146664', 'owner_address': '0xb8e99171f6536df47bc53657526b6dbdcfbc0ee9'}
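For the monitoring step in the next section you will need the job ID. You can copy it from the dashboard, or simply read it from the response returned above (assuming the previous cell succeeded):
# Grab the job ID from the creation response so later cells can reference it
job_id = fine_tune_response['id']
print(f"Fine-tuning job ID: {job_id}")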
📈 4. Monitor the Fine-Tuning Job
The fine-tuning process can take some time. You can monitor its status programmatically. The job will go through queued, running, processing_files, and finally completed states.
# Check the status of the fine-tuning job (the job itself can take a while to finish)
job_id = "ft-338e34c5-fdc5"  # Replace with your own job ID, e.g. fine_tune_response['id']
status = together.Finetune.retrieve(fine_tune_id=job_id)
job_status = status.get('status', 'unknown')
print(f"Current job status: {job_status}")
/var/folders/pv/g_b0j0n53rz5fm8yrlw3jg040000gn/T/ipykernel_55499/2545042428.py:5: DeprecationWarning: Call to deprecated function retrieve. status = together.Finetune.retrieve(fine_tune_id=job_id)
Current job status: completed
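If you would rather block until the job finishes instead of re-running the cell by hand, a simple polling loop works. The sketch below reuses the same retrieve call; the terminal state names are assumptions, so adjust them to whatever your job actually reports:
import time
# Poll the job status every 60 seconds until it reaches a terminal state.
# The terminal state names below are assumptions; adjust to match your job's reported states.
terminal_states = {"completed", "error", "cancelled"}
while True:
    status = together.Finetune.retrieve(fine_tune_id=job_id)
    job_status = f"{status.get('status', 'unknown')}"
    print(f"Current job status: {job_status}")
    if job_status in terminal_states:
        break
    time.sleep(60)  # wait a minute between checks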
🤖 5. Use the Fine-Tuned Model for Inference
Once the job is complete, your model is ready! Use the new model name returned by the API for inference. Remember to use the same prompt format ([INST]...[/INST]) that you used for training.
# This cell will only work once the fine-tuning job above has completed successfully.
# Get the fine-tuned model name from the Together fine-tuning dashboard.
dedicated_endpoint = 'https://api.together.ai/v1/inference/devon_a863/llama-2-7b-chat-dolly-llama2-7b-tutorial-e305d828' # FAKE dedicated endpoint for demonstration
import requests
headers = {
'Authorization': f'Bearer {os.environ.get("TOGETHER_API_KEY", "")}',
'Content-Type': 'application/json'
}
payload = {
"model": "devon_a863/llama-2-7b-chat-dolly-llama2-7b-tutorial-e305d828",
"prompt": "[INST] What is the secret to a successful startup? [/INST]",
"max_tokens": 256,
"temperature": 0.7,
"top_k": 50,
"top_p": 0.7,
"repetition_penalty": 1.1,
"stop": ["[/INST]", "</s>"]
}
try:
response = requests.post(dedicated_endpoint, headers=headers, json=payload)
response.raise_for_status()
print("\nFake Dedicated Endpoint Response:")
print(response.json())
except Exception as e:
print(e)
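If you don't have a dedicated endpoint, the legacy together SDK used earlier (the same deprecated-style Files/Finetune interface) also exposes a completion helper. The sketch below assumes together.Complete.create is available in your installed version and that the fine-tuned model is deployed for serverless inference; since the exact response structure may vary, it simply prints the raw result:
# Serverless-style inference through the legacy SDK (assumes together.Complete.create
# exists in your installed version and the fine-tuned model is deployed)
try:
    output = together.Complete.create(
        prompt="[INST] What is the secret to a successful startup? [/INST]",
        model="devon_a863/llama-2-7b-chat-dolly-llama2-7b-tutorial-e305d828",  # your fine-tuned model name
        max_tokens=256,
        temperature=0.7,
        stop=["[/INST]", "</s>"],
    )
    print(output)
except Exception as e:
    print(f"Inference call failed: {e}")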
Summary
This notebook demonstrates the full workflow for fine-tuning a Large Language Model (LLM) using the Together AI platform:
- Key Concepts: Introduces fine-tuning, dataset formatting, base models, and hyperparameters.
- Setup: Installs required libraries and loads API keys securely from a .env file.
- Data Preparation: Downloads a sample from the databricks/databricks-dolly-15k dataset, formats it for instruction-based fine-tuning, and saves it as a JSONL file.
- File Upload & Fine-Tuning: Uploads the prepared dataset to Together AI and creates a fine-tuning job using a base model (togethercomputer/llama-2-7b-chat).
- Job Monitoring: Shows how to monitor the fine-tuning job status programmatically.
- Inference: Demonstrates how to use the fine-tuned model for inference, including handling errors and (for demonstration) how to call a fake dedicated endpoint.
This notebook provides a practical, end-to-end guide for customizing LLMs with your own data on Together AI, with basic error handling along the way.